Building a Microblog Corpus for Search Result Diversification
Abstract
Queries that users pose to search engines are often ambiguous, either because different users express different intents with the same query terms or because the query is underspecified and it is unclear which aspect of the topic the user is interested in. In the Web search setting, search result diversification, which aims to produce a ranking that covers a range of query intents or aspects of a single topic, respectively, has been shown in recent years to be an effective strategy for satisfying search engine users. We hypothesize that such a strategy will also benefit search on microblogging platforms. Currently, progress in this direction is limited by the lack of a microblog-based diversification corpus. In this paper we address this shortcoming and present our work on creating such a corpus. We show that this corpus fulfils a number of diversification criteria described in the literature. We also report initial search and retrieval experiments evaluating the benefits of de-duplication in the diversification setting.
Similar resources
Improving Microblog Retrieval from Exterior Corpus by Automatically Constructing Microblogging Corpus
A large-scale training corpus consisting of microblogs belonging to a desired category is important for high-accuracy microblog retrieval. Obtaining such a large-scale microblogging corpus manually is very time- and labor-consuming. Therefore, some models for the automatic retrieval of microblogs from an exterior corpus have been proposed. However, these approaches may fail in considering microblog...
Language Differences and Metadata Features on Twitter
In the past several years, microblogging services like Twitter and Facebook have become a popular method of communication, allowing users to disseminate and gather information to and from hundreds or thousands (or even millions) of people, often in real-time. As much of the content on microblogging services is publicly accessible, we have recently seen many secondary services being built atop t...
RMIT at TREC 2011 Microblog Track
This paper describes our submission to the TREC 2011 microblog task. For the experiments, we use our new self-index search engine, NeWT, to support ranked search in the Twitter document corpus. We use a combination of phrase queries and degrading conjunctive Boolean intersection to improve retrieval effectiveness. Keywords: self-index; full-text search, phrases, threshold; intersection
Time-Aware Latent Concept Expansion for Microblog Search
Incorporating the temporal property of words into query expansion methods based on relevance feedback has been shown to have a significant positive effect on microblog search. In contrast to such word-based query expansion methods, we propose a concept-based query expansion method based on a temporal relevance model that uses the temporal variation of concepts (e.g., terms and phrases) on micro...
Publication date: 2013